Skip to content

Comments

fix: add shlex fallback parser and heredoc warning#127

Merged
leonvanzyl merged 1 commit intoAutoForgeAI:masterfrom
cabana8471-arch:fix/security-shlex-fallback
Jan 29, 2026
Merged

fix: add shlex fallback parser and heredoc warning#127
leonvanzyl merged 1 commit intoAutoForgeAI:masterfrom
cabana8471-arch:fix/security-shlex-fallback

Conversation

@cabana8471-arch
Copy link
Contributor

@cabana8471-arch cabana8471-arch commented Jan 29, 2026

Summary

Two focused bug fixes discovered from production log analysis:

  1. shlex fallback parser - Complex docker commands with nested quotes (e.g., docker exec container php -r "echo \"test\";") caused shlex.split() to fail, which previously returned an empty list and blocked the entire command. Now extracts the primary command via fallback parsing.

  2. heredoc warning - Sandbox mode blocks /tmp access, causing heredoc-style git commits to fail with "can't create temp file for here document". Added clear guidance to use multiple -m flags instead.

Changes

File Change
security.py Add _extract_primary_command() fallback when shlex fails
.claude/templates/coding_prompt.template.md Add heredoc warning with alternative patterns

Test plan

  • extract_commands('docker exec container php -r "echo \"test\";"') returns ['docker'] (was [])
  • All 162 security unit tests pass
  • ruff check security.py passes

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Updated development commit message guidelines with comprehensive rules, examples, and formatting standards for multi-line messages.
  • Bug Fixes

    • Improved command extraction resilience with fallback mechanisms to gracefully handle malformed or complex segments.

✏️ Tip: You can customize this high-level summary in your review settings.

- Add _extract_primary_command() fallback when shlex.split() fails on complex nested quotes (e.g., docker exec with PHP)

- Returns primary command instead of empty list, allowing valid commands to proceed

- Add heredoc warning to coding prompt - sandbox blocks /tmp access for here documents

- All 162 security tests pass
@coderabbitai
Copy link

coderabbitai bot commented Jan 29, 2026

📝 Walkthrough

Walkthrough

Two focused updates improve commit message guidance and command extraction robustness. The template file now includes detailed git commit rules with examples, while the security module gains a fallback helper function to handle malformed command segments gracefully instead of failing.

Changes

Cohort / File(s) Summary
Documentation & Guidelines
.claude/templates/coding_prompt.template.md
Added comprehensive "Git Commit Rules" section with specific guidance on flag usage, heredoc limitations, and multi-line message formatting. Expanded examples demonstrate both multi-line and single-message commit workflows.
Command Extraction Resilience
security.py
Added new _extract_primary_command() helper function to heuristically extract primary commands from malformed segments. Modified extract_commands() to use this fallback when shlex.split() fails, improving robustness without returning empty results.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Git commits now shine with clarity,
Rules and examples, no disparity!
Commands stumble? No need to fear—
Fallback extraction keeps paths clear!
hops merrily 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects both main changes: adding a shlex fallback parser (in security.py) and a heredoc warning (in the coding prompt template).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@security.py`:
- Around line 211-217: In the except ValueError block that currently calls
_extract_primary_command(segment) and appends fallback_cmd to commands, add an
explicit security-level log entry (including timestamp, the original segment,
the fallback_cmd, and context like caller or request id if available) whenever
fallback extraction is used; ensure the logger used is the module/application
security/audit logger and that the message clearly flags this as a fallback
extraction for malformed input so these events are auditable and easy to search
later.
🧹 Nitpick comments (1)
security.py (1)

198-198: Minor note: consider removing the comment.

The comment # (re is already imported at module level) is helpful during review but may become stale or add noise over time. Consider removing it since the import at line 10 is self-evident.

Comment on lines 211 to +217
except ValueError:
# Malformed command (unclosed quotes, etc.)
# Return empty to trigger block (fail-safe)
return []
# Try fallback extraction instead of blocking entirely
fallback_cmd = _extract_primary_command(segment)
if fallback_cmd:
commands.append(fallback_cmd)
continue
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's locate the security.py file and understand the structure
find . -name "security.py" -type f | head -5

Repository: leonvanzyl/autocoder

Length of output: 77


🏁 Script executed:

# Find test files related to security
find . -name "*test*.py" -o -name "test_*" -type f | grep -i security | head -10

Repository: leonvanzyl/autocoder

Length of output: 113


🏁 Script executed:

# Search for the _extract_primary_command function and related code
rg -n "_extract_primary_command|extract_commands" --type=py | head -20

Repository: leonvanzyl/autocoder

Length of output: 637


🏁 Script executed:

# Search for the bash_security_hook and understand the fail-safe mechanism
rg -n "bash_security_hook|ValueError.*shlex" --type=py -A 3 -B 3

Repository: leonvanzyl/autocoder

Length of output: 11785


🏁 Script executed:

# Read the security.py file to understand the fallback mechanism and the surrounding context
cat -n security.py | sed -n '143,220p'

Repository: leonvanzyl/autocoder

Length of output: 2848


🏁 Script executed:

# Look at the blocklist and how it's enforced
rg -n "HARDCODED_BLOCKLIST|BLOCKLIST|sudo|dd|shutdown" security.py -A 2 -B 2 | head -50

Repository: leonvanzyl/autocoder

Length of output: 1513


🏁 Script executed:

# Examine the test_extract_commands function to see what's tested
cat -n test_security.py | sed -n '97,150p'

Repository: leonvanzyl/autocoder

Length of output: 2292


🏁 Script executed:

# Search for tests with malformed/unclosed commands
rg -n "unclosed|malformed|shlex|ValueError" test_security.py -A 3 -B 2

Repository: leonvanzyl/autocoder

Length of output: 46


🏁 Script executed:

# Check if there are any tests with chained commands and malformed input
rg -n "&&|;|pipe.*unclosed|malformed.*&&" test_security.py -A 5 -B 5

Repository: leonvanzyl/autocoder

Length of output: 2062


🏁 Script executed:

# See how the blocklist is enforced in bash_security_hook
cat -n security.py | sed -n '787,850p'

Repository: leonvanzyl/autocoder

Length of output: 3020


🏁 Script executed:

# Look at the full validate_command function to understand blocklist enforcement
rg -n "def validate_command|BLOCKED_COMMANDS|DANGEROUS_COMMANDS" security.py -A 5 | head -100

Repository: leonvanzyl/autocoder

Length of output: 1453


🏁 Script executed:

# Check if there are any tests with malformed input (unclosed quotes)
rg -n "unclosed|malformed|shlex.*fail|ValueError" test_security.py

Repository: leonvanzyl/autocoder

Length of output: 46


🏁 Script executed:

# Look for tests that specifically test the fallback mechanism
rg -n "fallback|_extract_primary_command" test_security.py

Repository: leonvanzyl/autocoder

Length of output: 46


🏁 Script executed:

# Search entire codebase for any tests with malformed commands containing chained operations
rg -n '".*&&.*".*unclosed|unclosed.*&&' test_security.py test_security_integration.py

Repository: leonvanzyl/autocoder

Length of output: 46


🏁 Script executed:

# Let me check if there are any tests with malformed commands in general
rg -n 'unclosed|malformed|".*\\$' test_security.py test_security_integration.py -A 3 -B 1

Repository: leonvanzyl/autocoder

Length of output: 46


🏁 Script executed:

# Check if the fallback extraction is tested with any dangerous commands
rg -n 'sudo|dd|shutdown' test_security.py -A 2 -B 2

Repository: leonvanzyl/autocoder

Length of output: 1638


🏁 Script executed:

# Let's trace through a specific test case with chained commands to understand the extraction
cat -n test_security.py | sed -n '1075,1130p'

Repository: leonvanzyl/autocoder

Length of output: 660


🏁 Script executed:

# Check if there are any edge case tests with malformed input
rg -n "test.*malformed|test.*unclosed|test.*fallback" test_security.py -A 10

Repository: leonvanzyl/autocoder

Length of output: 46


🏁 Script executed:

# Let me examine the complete extract_commands function to see all code paths
cat -n security.py | sed -n '182,280p'

Repository: leonvanzyl/autocoder

Length of output: 3457


🏁 Script executed:

# Let me check what the code did BEFORE the fallback was introduced
# by looking at git history to see the original behavior
git log --oneline -20 -- security.py

Repository: leonvanzyl/autocoder

Length of output: 122


🏁 Script executed:

# Check if there are any tests that specifically test the behavior when extract_commands returns empty
rg -n "extract_commands.*\[\]|empty.*commands|extract.*fail" test_security.py -B 3 -A 3

Repository: leonvanzyl/autocoder

Length of output: 46


🏁 Script executed:

# Let's create a test case to verify the concern - malformed input with chained dangerous commands
python3 << 'EOF'
import sys
sys.path.insert(0, '.')

from security import extract_commands, _extract_primary_command

# Test case 1: Malformed segment with chained dangerous command
test_cases = [
    # Normal cases
    ('docker exec php -r "echo test"', "normal docker with unclosed quote simulation"),
    # Malformed with chained command
    ('valid_cmd "unclosed && sudo rm -rf /', "chained dangerous after unclosed quote"),
    ('ls "unclosed && dd if=/dev/zero', "dd chained after unclosed quote"),
    ('echo "test && shutdown -h now', "shutdown chained after unclosed quote"),
]

print("Testing extract_commands with malformed + chained dangerous commands:\n")
for cmd, description in test_cases:
    result = extract_commands(cmd)
    print(f"Input: {cmd!r}")
    print(f"Description: {description}")
    print(f"Extracted commands: {result}")
    print()
EOF

Repository: leonvanzyl/autocoder

Length of output: 631


🏁 Script executed:

# Check if the fallback was recently added or if it was always there
git log -p -- security.py | grep -A 10 -B 10 "_extract_primary_command" | head -50

Repository: leonvanzyl/autocoder

Length of output: 46


🏁 Script executed:

# See when the fallback logic was introduced
git log --oneline --all | head -5

Repository: leonvanzyl/autocoder

Length of output: 402


🏁 Script executed:

# Check what happens when extract_commands returns empty list in bash_security_hook
cat -n security.py | sed -n '808,820p'

Repository: leonvanzyl/autocoder

Length of output: 574


🏁 Script executed:

# Let me test what happens with a pure malformed command (no valid first command)
python3 << 'EOF'
import sys
sys.path.insert(0, '.')

from security import extract_commands

# Test cases where malformed has NO valid command
test_cases = [
    ('&&& malformed', "operators only"),
    ('"unclosed only', "just unclosed quote"),
    ('; ; ;', "only operators"),
    ('|||', "only pipes"),
]

print("Testing extract_commands with ONLY malformed syntax:\n")
for cmd, description in test_cases:
    result = extract_commands(cmd)
    print(f"Input: {cmd!r}")
    print(f"Description: {description}")
    print(f"Extracted commands: {result}")
    print(f"Length: {len(result)}")
    print()
EOF

Repository: leonvanzyl/autocoder

Length of output: 451


🏁 Script executed:

# Check the commit that introduced the fallback to understand the original behavior
git show d47028d --stat

Repository: leonvanzyl/autocoder

Length of output: 11619


🏁 Script executed:

# Let's see the actual changes made in that commit
git show d47028d

Repository: leonvanzyl/autocoder

Length of output: 50380


🏁 Script executed:

# Check what tests exist for malformed input and fallback
rg -n "test.*fallback|test.*malformed|test.*unclosed|test.*shlex" test_security.py -A 5 -B 2

Repository: leonvanzyl/autocoder

Length of output: 46


🏁 Script executed:

# Verify that all BLOCKED_COMMANDS and DANGEROUS_COMMANDS are in the blocklist enforcement
cat -n security.py | sed -n '72,115p'

Repository: leonvanzyl/autocoder

Length of output: 1268


🏁 Script executed:

# Verify the blocklist is actually enforced in bash_security_hook
cat -n security.py | sed -n '834,860p'

Repository: leonvanzyl/autocoder

Length of output: 1541


🏁 Script executed:

# Test if a dangerous command can bypass when malformed
python3 << 'EOF'
import sys
sys.path.insert(0, '.')

from security import extract_commands, get_effective_commands

# Test if sudo/dd can bypass in malformed segments
malformed_tests = [
    'valid_cmd "unclosed && sudo apt install',
    'echo test; sudo rm -rf /',
    'ls "unclosed; dd if=/dev/zero',
]

dangerous_set = {"sudo", "dd", "shutdown", "reboot"}

print("Testing if dangerous commands extracted from malformed segments:\n")
for cmd in malformed_tests:
    extracted = extract_commands(cmd)
    has_dangerous = any(c in dangerous_set for c in extracted)
    print(f"Input: {cmd!r}")
    print(f"Extracted: {extracted}")
    print(f"Contains dangerous: {has_dangerous}")
    print()

# Check the blocklist
allowed, blocked = get_effective_commands(None)
print("Dangerous commands in blocklist:")
for cmd in ["sudo", "dd", "shutdown", "reboot", "su"]:
    in_blocked = cmd in blocked
    print(f"  {cmd}: {in_blocked}")
EOF

Repository: leonvanzyl/autocoder

Length of output: 507


🏁 Script executed:

# Let me verify the actual security concern more thoroughly
# Test the specific example from the review comment
python3 << 'EOF'
import sys
sys.path.insert(0, '.')

import asyncio
from security import bash_security_hook

test_cases = [
    # The example from the review comment
    'docker exec php -r "unclosed && sudo rm -rf /',
    # Variations
    'valid_cmd "unclosed && sudo apt install package',
    'echo test; sudo rm -rf /',  # This should work (extracted)
    'ls "unclosed && shutdown -h now',
]

async def test_command(cmd):
    input_data = {"tool_name": "Bash", "tool_input": {"command": cmd}}
    result = await bash_security_hook(input_data)
    decision = result.get("decision", "allow")
    reason = result.get("reason", "allowed by default")
    return decision, reason

print("Testing bash_security_hook with malformed + dangerous commands:\n")
for cmd in test_cases:
    decision, reason = asyncio.run(test_command(cmd))
    print(f"Command: {cmd!r}")
    print(f"Decision: {decision}")
    if decision == "block":
        print(f"Reason: {reason[:100]}...")
    print()
EOF

Repository: leonvanzyl/autocoder

Length of output: 586


🏁 Script executed:

# Check if there are any tests currently for the fallback behavior
grep -r "fallback\|_extract_primary_command" test_security.py test_security_integration.py

Repository: leonvanzyl/autocoder

Length of output: 46


🏁 Script executed:

# Look for any TODO or FIXME comments about the fallback
rg -n "TODO|FIXME|XXX" security.py | head -20

Repository: leonvanzyl/autocoder

Length of output: 46


🏁 Script executed:

# Let me trace through the logic more carefully for the docker case
python3 << 'EOF'
import sys
sys.path.insert(0, '.')

from security import extract_commands, is_command_allowed, get_effective_commands

# The critical case
cmd = 'docker exec php -r "unclosed && sudo rm -rf /'
print(f"Testing: {cmd!r}\n")

# Step 1: What gets extracted?
extracted = extract_commands(cmd)
print(f"Step 1 - Extracted commands: {extracted}")

# Step 2: Is docker allowed?
allowed, blocked = get_effective_commands(None)
docker_allowed = is_command_allowed("docker", allowed)
print(f"Step 2 - Is 'docker' allowed? {docker_allowed}")
print(f"         Docker in allowed: {'docker' in allowed}")
print(f"         Docker in blocked: {'docker' in blocked}")

# Step 3: Was sudo extracted?
sudo_extracted = "sudo" in extracted
print(f"Step 3 - Was 'sudo' extracted? {sudo_extracted}")
print(f"         Sudo in blocked: {'sudo' in blocked}")

print(f"\nResult: Command ALLOWED because docker is allowed and sudo was NOT extracted")
print(f"Risk: sudo is present in the command string but never validated")
EOF

Repository: leonvanzyl/autocoder

Length of output: 473


🏁 Script executed:

# Check what's in the default allowlist for docker
rg -n "docker" security.py | head -20

Repository: leonvanzyl/autocoder

Length of output: 168


🏁 Script executed:

# Verify the default allowed commands
cat -n security.py | sed -n '35,70p'

Repository: leonvanzyl/autocoder

Length of output: 1116


Add logging when fallback command extraction is used to ensure security-critical misses are auditable.

The fallback extraction for malformed segments (e.g., docker exec php -r "unclosed && sudo rm -rf /) can allow dangerous commands to bypass validation. In the example above:

  • docker is extracted and allowed
  • sudo inside the malformed quote is never extracted or validated
  • The command passes the security check despite containing a blocked command

While such malformed input typically fails shell parsing as well (defense-in-depth), this creates an unauditable security blind spot. The original fail-safe behavior blocked unparseable commands entirely; the new behavior extracts only the primary command and allows it if safe.

Consider adding explicit logging (with timestamp and command context) when fallback extraction is triggered, so security-sensitive operations are auditable. Alternatively, document this tradeoff as an accepted risk in the project's security model.

🤖 Prompt for AI Agents
In `@security.py` around lines 211 - 217, In the except ValueError block that
currently calls _extract_primary_command(segment) and appends fallback_cmd to
commands, add an explicit security-level log entry (including timestamp, the
original segment, the fallback_cmd, and context like caller or request id if
available) whenever fallback extraction is used; ensure the logger used is the
module/application security/audit logger and that the message clearly flags this
as a fallback extraction for malformed input so these events are auditable and
easy to search later.

@leonvanzyl leonvanzyl merged commit 77b91ca into AutoForgeAI:master Jan 29, 2026
3 checks passed
@leonvanzyl
Copy link
Collaborator

Thanks

leonvanzyl added a commit that referenced this pull request Jan 29, 2026
- Add debug logging when shlex fallback extraction is used, capturing
  both successful extractions and failures for security auditing
- Add test case for docker nested quotes that trigger fallback parser
- Remove redundant comment about re import (already at module level)

Follow-up improvements from PR #127 code review:
- Enables tracking of malformed command patterns in production logs
- Verifies fallback parser handles the exact docker exec case reported

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@cabana8471-arch cabana8471-arch deleted the fix/security-shlex-fallback branch January 29, 2026 16:35
rudiheydra added a commit to rudiheydra/AutoBuildr that referenced this pull request Feb 2, 2026
…ture AutoForgeAI#127)

- 63 tests verifying all 9 feature steps for FeatureCompiler.compile()
- Validates task_type mapping, tool_policy, budgets, acceptance_spec, traceability
- Confirms 5 distinct task_types across multi-feature DB run (coding, testing, audit, documentation, refactoring)
- Zero regressions in existing 46-test E2E suite

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CoreAspectStu pushed a commit to CoreAspectStu/autocoder-custom that referenced this pull request Feb 9, 2026
…shlex-fallback

fix: add shlex fallback parser and heredoc warning
CoreAspectStu pushed a commit to CoreAspectStu/autocoder-custom that referenced this pull request Feb 9, 2026
- Add debug logging when shlex fallback extraction is used, capturing
  both successful extractions and failures for security auditing
- Add test case for docker nested quotes that trigger fallback parser
- Remove redundant comment about re import (already at module level)

Follow-up improvements from PR AutoForgeAI#127 code review:
- Enables tracking of malformed command patterns in production logs
- Verifies fallback parser handles the exact docker exec case reported

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants